Introduction

This page is an index to the supplementary materials for the article “Tsi in Yiddish”.

Data used for the research

For this project, I used different sources for different languages. For all the languages except Ukrainian, the initial sources were texts. The texts were subsequently processed with a Python script.

For Ukrainian, we used the GRAC corpus: [Maria Shvedova, Ruprecht von Waldenfels, Sergiy Yarygin, Mikhail Kruk, Andriy Rysin, Michał Woźniak (2017-2018): GRAC: General Regionally Annotated Corpus of Ukrainian. Electronic resource: Kyiv, Oslo, Jena. Available at uacorpus.org].

From there, I extracted questions using the following CQL query:

<s> []{1,15} [word =="?"]
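As a rough illustration of what this query selects, the sketch below applies the same pattern to an already tokenized sentence: the match starts at the sentence start, spans 1 to 15 arbitrary tokens, and ends at a "?" token. The function name `match_question` and the choice to return the shortest match are assumptions made for this example; the corpus engine itself may report matches differently.

```python
def match_question(tokens, max_len=15):
    """Return the span matched by <s> []{1,15} [word=="?"], or None.

    `tokens` is a single tokenized sentence. The span must start at the
    sentence start, contain 1..max_len arbitrary tokens, and end at "?".
    Only the shortest such span is returned (an assumption of this sketch).
    """
    last_index = min(max_len, len(tokens) - 1)
    for i in range(1, last_index + 1):
        if tokens[i] == "?":
            return tokens[: i + 1]
    return None


if __name__ == "__main__":
    print(match_question(["Чи", "ти", "тут", "?"]))
```

Sentences in which the first question mark comes after more than 15 tokens are not matched, which is why long questions are absent from the Ukrainian sample.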

Then, I wrote a Python script to extract question particles, such as А, Чи, and Невже, from the questions. A sample output:

AUTHOR TITLE QWORD QUOTE Birthreg POETRY GENERIC_QWD NEGATIVE RHET
Segalovich Antosha-novels QWORD װעמען פארטײדיקט ער ? NA QWORD NA NA
Borokhov Ber Geklibene Shriften 1905-1914 CZY צי קען די עמיגראַציאָנס-אָרגאַניזאַציע זײן רײן-דעמאָקראַטיש ? UA-CRK NA CZY NA NA
Segalovich Antosha-novels NOQWORD —לערן זיך,——האָט אים אַ חבר געראַמען—געלט האָסטו, צײט —איצטער זיך לערנען? NA NOQWORD NA NA
Sforim Letzte shriften 1904-1917 QWORD אַ פאָהר אַהין אַ פאָהר צוריק, אַײן עס, אַ טרונק קאָסט דאָך געלד, און װוּ זענען װײבּער ? NA QWORD NA NA
Markish Dor oys dor eyn 1929 FIC QWORD בּא יעדערן זיך אָפּגעשטעלט, ארײנגעקוקט אינעװײניק אין שטױבּיק שרײענדיקע אינגעװײדן פון אָט דעם נײעם מין גאָלעס, געטײלט <quote>גוט י מאָרגנס<quote> די אָרעמע װאַנדערער און זיך פאַנאַנדערגעפרעגט : פונװאַנען, אידן ? NA QWORD NA NA
A. Reisen Dertseylungen QWORD אום האַרבםט, װען דער װינט פלעגט בלאָזן, פלעגן דעמאָלט די צװײגן זיך שאָקלען און רױשן טרויעריק אין פענצטער אַריין, װי זײ װאָלטן עפעם אָן אומעטיקע לאָנגאָנענדיקע מײםע װעלן כאָ? BY-MSQ NA QWORD NA NA
Varshavski Oyzer Shmuglars 1920 QWORD —גײ שױן, גרונם, װאָס איז עס פאַר אַ לשון? NA QWORD NA NA
Markish Dor oys dor eyn 1929 FIC QWORD גזאָמער װאָס ? NA QWORD NA NA
Vergelis Oyg oyf Oyg collection POE NOQWORD זאָט דעו געקאָנט עש דשאַליו פאַרתאַלשו ן זאָט דעו געקאָנט עע זשאַליו צעשפּאַלטו ? YPOETRY NOQWORD NA NA
Markish Der Trot fun Doyres 1947 FIC QWORD ס’איז אים עפּעס געװאָרן טײַערער; ער האָט פונדאָסנײַ באטראכט דעם באלקן, די ערטערװײַז שימלענדיקע װענט און צװישן אײן װאנט און דער צװײטער ארײַנגעגאנװעט א קוק אפן קװארטיראנט אלײן, װאָס האָט אים איצט אױסגעזען א סאך גרעסער, װי ער איז, און פארמאכנדיק דעם בוך מיט אן איבערגעלאָזטן פינגער אינװײניק, האָט ער א זאָג געטאָן שטראלנדיק: — הײסט עס, אף איבערקערן די װעלט מאכט ניט אױס, אז מע איז א ייִִד און דע-גלײַכן? NA QWORD YNEG NA
Markish Dor oys dor eyn 1929 FIC NOQWORD ס’װאָלט דיר ניט געפאסט זײן קײן זאַװאָדטשיק, מײנסטו, האַ ? NA NOQWORD YNEG NA
Segal Kalman Getraye Libe 1960 FIC QWORD האָסט געזען אַמאָל אַ קלײן קינד, װאָס איז נרשט געבױרן גערואָרן? NA QWORD NA NA
A. Reisen Dertseylungen QWORD — מיינער, װאָם הײםט מי״יגער? BY-MSQ NA QWORD NA NA
Sutskever Valdiks 1937-1939 POE CZY צי איז דײַן פּנים אײנערלײ מיט מײַנעם? YPOETRY CZY NA NA
Spektor Mordechai Elnte un Farshtosene CZY-dep װער װײסט צי זי װעט מיך שױן גאָר אין שטובּ אַרײנלאָזן און װוּ אַהין-זשע האָבּ איך דען צו גײן, אַז נישט צו „דער פלײשיקער ? NA CZY-dep YNEG YRHET
A. Reisen Naye Verk NOQWORD בּרוקירען ? NA NOQWORD NA NA
Sholem Ash Motiven QWORD — נו , װײַס ניט נאָט װאָס ער טוט ? NA QWORD YNEG NA
Spektor Mordechai Shmad un Fertsvayflung QWORD — פאַניע סטאַרשינע,—האָט דבורה זיך אָנגערופן צו אים מיט אַירחמנות רחמנות פנים, װי גלײך דאָס גרעסטע אומגליק האָט זי געטראָפן, - אפשר װאָלסטו אים אָפּגעלאָזט פרײ ? NA QWORD NA NA
Varshavski Oyzer Shmuglars 1920 QWORD זעען יענע ייִדן, װי די חברה װאַקסט, נעמען זײ זי * דיקן אױף הינטער און ס’געפינען זיך שױן אַ פּאָר בּעליבּתישע מענטשלאַך, װאָס טענהן : —צי װאָס האָט דאָס געטױגט ? NA QWORD NA NA
Sholem Blondzhe Stern QWORD <quote>זיץ, מאַכט צו אים דער פּאָרעץ, זיץ, װאָס שטײסטו ? NA QWORD NA NA
Dinezon Zikrones nybc 207361 QWORD מאַכט זיך אַמאָל, עס עפנט בּײ אים אַ שטיװל דאָס מױל, אַ גאָט רײסט זיך, אָדער זײנע אַרבּל האָבּן זיך אױסגעריבּן און דער אונטערשלאַק הײבּט זיך אָן ארױסצוּװײזן, זאָגט בּרוך : — צו װאָס האָט גאָט, בּרוך הוא, בּאַשאַפן אַ נאָדל-פאָדם מיט לאַטקעלעך ? NA QWORD NA NA
Ettinger Serkele QWORD װאָס פֿעלט אײַך, װאָס? NA QWORD NA NA
Dinezon Alter QWORD — װאָס-זשע טוט מען פאָרט, ער זאָל יענע פאַרגעסן? NA QWORD NA NA
Linetski Dos poylishe yungel 1867 NOQWORD האַ ? NA NOQWORD NA NA
Markish Milkhome Stalingrad POE QWORD איז װעלכער װינט האָט זיך ניט אָפּגעשטעלט אַליע ? YPOETRY QWORD YNEG NA
Spektor Mordechai Yiddishe Tekhter QWORD װאָס טוט מען גישבּ פון אַ קינדס װעגן ? NA QWORD NA NA
Dinezon Alter QWORD זעט ער זי, קוקט זי אַלעמאָל אױף אים, נאָר װאָס איז דאָפּ זײן דאגה ? NA QWORD NA NA
Perets Geklibene verk nybc 209359 QWORD — װאָס װעל איך דאָרט טאָן ? NA QWORD NA NA
Perets Geklibene verk nybc 209359 NOQWORD פּערע ? NA NOQWORD NA NA
Eliezer Steinbarg Mayselekh kindertales QWORD האַלט דער נאָװי זיך מיטן פעדערל אין מױל און װײסט ניט, נעבעך, װאָס מיט אים צו טון : אַװעקװאַרפן ? NA QWORD YNEG NA
Linetski Der Pritshepe 1876 QWORD — נאָר דיא קשיא, װאָס נאָך קען מען אױפטיהן פֿאַר גײעס אין אחשורוש-שפיעל, װאָס אַללע פורים שפיעלער האָבען שון בין אַהער ניט אױפֿגעטיהן ? NA QWORD YNEG NA
Kaczerginski Grine legende stories 1943 QWORD װי האָבן זײ זיך צוזאמענגערעדט? NA QWORD NA NA
Vergelis Notizen vegn a seyder SCI CZY צי איז ער גיט ניכשל געװאָרן פון די קאטױלישע פּאָסטולאטן? NA CZY NA NA
Ester Kreitman Briliantn 1944 FIC NOQWORD װײניק שידוכים, אסתר קרייטמאַן מײנט איר, האָב איך אים שױן אַלײן גערעדט ? NA NOQWORD NA NA
Sholem Aleykhem lat Tevye NOQWORD — azoy? NA NOQWORD NA NA
Vergelis Di tsayt FIC 1981 QWORD — װען פאָרט איר אין ביראָבידזשאנער ראיאָן און אף װאָס פאר א שטעלע? NA QWORD NA NA
Sutskever Yidishe Gas 1941-1947 POE NOQWORD און װעדליק צײט ? YPOETRY NOQWORD NA NA
Eliezer Steinbarg Mesholim POE NOQWORD נײן ? YPOETRY NOQWORD NA NA
Sforim di takse 1869 QWORD װאס איז איהר האט אױך אַ כבוד ? NA QWORD NA NA
Linetski Dos poylishe yungel 1867 QWORD אײ פֿון װאס ? NA QWORD NA NA
Spektor Mordechai Soydes QWORD גאָר װאָס װעל איך אײך גײן דערצײלן? NA QWORD NA NA
Fefer Rotarmeyish 1944 POE QWORD װער װײסט ? YPOETRY QWORD NA NA
Segalovich Antosha-novels QWORD װוּ איז דאָס אַהינגעקומען ? NA QWORD NA NA
Vergelis Reyzes NOQWORD — אראָפּנעמען דעם כײרעם, אין װעלכן מע האָט שפּינאָזען ארײַנגעלײגט מיט דרײַ הונדערט יאָר צוריק? NA NOQWORD NA NA
Markish Der Trot fun Doyres 1947 FIC NOQWORD פארשטײסט? NA NOQWORD NA NA
Perets Geklibene verk nybc 209359 CZY צי ליגט נישט אין אים דער שמערץ פאַר אַלע, דאָס לײדן פאַרן גאַנצן דאָר ? NA CZY YNEG NA
Sforim Letzte shriften 1904-1917 NOQWORD אבער קידוש מאַכט רב יודעל ? NA NOQWORD NA NA
Horonczyk In geroysh fun mashinen NOQWORD ער האָט לאנג דך געשלאָגן מיט דער דעה, צי ער זאָל פרעגן, און ערשע נאָך אַ לאַנגן איבּערלײגן זיך, האָט ער אַ פרעג געטאָן: — און די פאַרטײ אַרבּעט — פירט אָן סעמערל ? NA NOQWORD NA NA
A. Reisen Dertseylungen NOQWORD — אַן אומגליק? BY-MSQ NA NOQWORD NA NA
Dinezon Tsvey mames QWORD — פאַרװאָס ? NA QWORD NA NA

Comments on the data files used: raw dataset (with the question text string)

Below is information about the data files used in this research: the original data file produced by a Python script (an extract is shown above) and the summary tables used for the plots (see the interactive plots below).

Author: The writer of a given text.

Title: The title of a given text.

Genre: The genre of a text. Fiction texts represent the main share of our database, while poetry is excluded from it completely.

Quote: The question sentence extracted from the text or, for Ukrainian, from the GRAC corpus.

QWORD, GENERIC_QWD: Type of the question string, as determined by the Python script and then, for the Yiddish material, manually adjusted. For Yiddish, the QWORD variable is a “raw” one: it is not used for quantitative analysis, but serves as an index of the full variability of the questions. To simplify the quantitative analysis, all the options available for the QWORD variable were merged into the GENERIC_QWD variable.

The GENERIC_QWD variable takes the following values:

1. QWORD - Any type of wh-word (‘how’, ‘where’, ‘why’ and so on) is found in the question string. When working with the Yiddish data, if another question particle (tsi, czy and so on) is also present, the question is assigned to the CZYQWORD group and then manually moved to the CZY or QWORD group. For other languages this procedure was not performed and, for now, the CZYQWORD group is merged with QWORD.

2. NOQWORD - Question without any question particle (counted in our analysis).

3. CZY - One instance of the Чи particle found in the question string.

4. CZY-dep - Tsi is used as a complement clause marker.

5. CZY-or - Tsi is used as a disjunctive connector.
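The automatic part of this labeling can be sketched in Python as follows. This is not the script used for the study: the word lists are short illustrative assumptions (a few wh-words and the particle צי as they appear in the sample rows), and the function name `classify_question` is hypothetical. The CZY-dep and CZY-or labels are not produced here, since they were assigned manually.

```python
# Illustrative, incomplete word lists (assumptions, not the study's lists).
WH_WORDS = ("װאָס", "װער", "װוּ", "װי", "װען", "פאַרװאָס")
CZY_PARTICLES = ("צי",)
PUNCTUATION = "?!.,:;\"'-"


def classify_question(quote):
    """Assign a provisional question type to one question string."""
    tokens = [t.strip(PUNCTUATION) for t in quote.split()]
    has_wh = any(t in WH_WORDS for t in tokens)
    has_czy = any(t in CZY_PARTICLES for t in tokens)
    if has_wh and has_czy:
        # For Yiddish, this group was later resolved by hand to CZY or QWORD.
        return "CZYQWORD"
    if has_wh:
        return "QWORD"
    if has_czy:
        return "CZY"
    return "NOQWORD"


if __name__ == "__main__":
    print(classify_question(WH_WORDS[0] + " ?"))
```

Exact token matching is a simplification: spelling variation in the texts means that a real word list would need many more variants.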

POETRY: The question belongs to a poetry text. NA - not poetry, YPOETRY - poetry.

NEGATIVE: Negation found in the question, that is, the strings " ניט " and " נישט " (nit and nisht, respectively) occur in the question string (QUOTE). NA - no negation, YNEG - negation found.

RHET: The question is likely to have rhetorical semantics, that is, the strings " דען “, ” דענ “, ” טאַקע “ ," טאקע “,” טאַקי“, טאַ”ק טאַקיי“” (representing the rhetorical particles take and den in various spelling variants) are found in the text. NA - not attested, YRHET - rhetorical particles found.
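The two string-based flags above amount to simple substring tests, which might look like this in Python. The sketch is an assumption about the logic, not the original script: the function names are hypothetical, the rhetorical list contains only the variants that are clearly legible in the description, and "NA" is represented as a plain string.

```python
# Strings are matched with their surrounding spaces, as in the description.
NEG_STRINGS = (" ניט ", " נישט ")
# Partial list: only the clearly legible spelling variants are included.
RHET_STRINGS = (" דען ", " דענ ", " טאַקע ", " טאקע ")


def flag_negative(quote):
    """Return "YNEG" if a negation string occurs in the question, else "NA"."""
    return "YNEG" if any(s in quote for s in NEG_STRINGS) else "NA"


def flag_rhetorical(quote):
    """Return "YRHET" if a rhetorical particle string occurs, else "NA"."""
    return "YRHET" if any(s in quote for s in RHET_STRINGS) else "NA"


if __name__ == "__main__":
    example = "x" + NEG_STRINGS[0] + "y ?"
    print(flag_negative(example), flag_rhetorical(example))
```

Because the strings include spaces on both sides, a particle at the very start of a question or directly before punctuation is not caught by this kind of test.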

The main metric: czyperc

For different parts of our analysis, we used two slightly different ways to calculate the czyperc metric.

First, for the parts focusing on Yiddish, we defined czyperc as the share of CZY questions proper (excluding the CZY-dep and CZY-or subtypes), since these subtypes were manually labeled in the Yiddish part of the database. The formula is:

\[\frac{CZY}{\sum{NOQWORD + CZYdep + CZYor + CZY}}\]

The second definition of czyperc was used for the Slavic languages and for Yiddish in comparison with them. As the types of “tsi”-kind particles were not labeled in this part of the database, all types of CZY were counted together. The formula for this czyperc is:

\[\frac{CZY+CZYdep+CZYor}{\sum{NOQWORD + CZYdep + CZYor + CZY}}\]
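The two definitions differ only in the numerator, which a short Python sketch makes explicit. The function names are hypothetical; the arguments are the per-writer counts of each question type, and wh-questions (QWORD) do not enter either formula.

```python
def czyperc_yiddish(czy, czy_dep, czy_or, noqword):
    """First definition: only CZY proper is counted in the numerator."""
    total = noqword + czy_dep + czy_or + czy
    return czy / total if total else float("nan")


def czyperc_comparative(czy, czy_dep, czy_or, noqword):
    """Second definition: all uses of the particle are counted together."""
    total = noqword + czy_dep + czy_or + czy
    return (czy + czy_dep + czy_or) / total if total else float("nan")


if __name__ == "__main__":
    # Made-up counts: 10 CZY, 5 CZY-dep, 5 CZY-or, 80 NOQWORD
    print(czyperc_yiddish(10, 5, 5, 80))      # 0.1
    print(czyperc_comparative(10, 5, 5, 80))  # 0.2
```

With these made-up counts the first definition gives 0.1 and the second 0.2, so values from the two definitions should not be compared directly.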

Datafile by writer

The “raw” dataset was used to produce a summary table for further analysis. The general formula for that was:

In the example above, a condition is used to exclude poetical works from the table (POETRY!="YPOETRY"). Poetry texts were processed separately from the fiction texts of the same authors and marked as a distinct author (for example, Sutskever and Sutskever POE).

The same steps were applied to the “raw” datasets for the other languages. For the Slavic languages and Yiddish in comparison with them, the resulting table is presented below:


For the Yiddish-centered part of the research, the table was more complete, including separate scores: czyperc for CZY-type questions and czypercDO, the same metric calculated for CZY-dep and CZY-or questions:

For the analysis of rhetorical questions, a reduced version of the table above was used, with poetry texts excluded.
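The per-writer summarization described in this section could be sketched in Python as below. This is a reconstruction under assumptions, not the code used for the study: the function name, the output keys, and the use of the common denominator for czypercDO are all hypothetical, and rows are assumed to be dictionaries keyed by the raw dataset's column names.

```python
from collections import Counter, defaultdict


def summarize_by_writer(rows, exclude_poetry=True):
    """Count question types per author and derive czyperc and czypercDO."""
    counts = defaultdict(Counter)
    for row in rows:
        if exclude_poetry and row["POETRY"] == "YPOETRY":
            continue  # mirrors the POETRY != "YPOETRY" condition
        counts[row["AUTHOR"]][row["GENERIC_QWD"]] += 1

    table = {}
    for author, c in counts.items():
        # QWORD questions are not part of the denominator.
        total = c["NOQWORD"] + c["CZY"] + c["CZY-dep"] + c["CZY-or"]
        table[author] = {
            "total": total,
            "czyperc": c["CZY"] / total if total else None,
            # Assumption: czypercDO uses the same denominator as czyperc.
            "czypercDO": (c["CZY-dep"] + c["CZY-or"]) / total if total else None,
        }
    return table


if __name__ == "__main__":
    sample = [
        {"AUTHOR": "A", "POETRY": "NA", "GENERIC_QWD": "CZY"},
        {"AUTHOR": "A", "POETRY": "NA", "GENERIC_QWD": "NOQWORD"},
    ]
    print(summarize_by_writer(sample))
```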


Illustrations from the article

Figure 1

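Distribution of tsi as a Yes/No question particle in dialects of Yiddish among writers. Region of birth and date of birth are given.

# Packages needed by the figure code (the subsetting syntax assumes data.table objects)
library(data.table); library(ggplot2); library(plotly)
# Boxplot of czyperc per dialect; each writer is labelled name-region-birthdate
figure1 <- ggplot(writerstable_ypoe[POETRY != "YPOETRY" & Birthdate != "NA"],
                  aes(x = Dialect, y = czyperc, color = Dialect)) +
  geom_boxplot() +
  geom_text(aes(label = paste(rn, Birthreg, Birthdate, sep = "-")))
ggplotly(figure1)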


Figure 2

Diachronic development of the use of tsi as a question particle.

# Writers with a known birthdate; poetry and Borokhov Ber are excluded
fig2_data <- writerstable_ypoe[POETRY != "YPOETRY" & Birthdate != "NA" & rn != "Borokhov Ber"]
figure2 <- ggplot(fig2_data, aes(x = Birthdate, y = czyperc, color = Dialect)) +
  geom_point() +
  geom_text(aes(label = rn)) +
  stat_smooth(method = "lm", se = FALSE) +  # linear trend per dialect
  ylim(0, 0.25) +
  # overall linear trend across all writers, drawn in black
  stat_smooth(aes(x = fig2_data[, Birthdate], y = fig2_data[, czyperc], color = "general"),
              method = "lm", se = FALSE, size = 1, fill = "black", colour = "black")
ggplotly(figure2)


Figure 3

Diachronic development of tsi as a disjunctive connector or a complement clause marker.

# Same subset as in Figure 2, plotted for czypercDO
fig3_data <- writerstable_ypoe[POETRY != "YPOETRY" & Birthdate != "NA" & rn != "Borokhov Ber"]
figure3 <- ggplot(fig3_data, aes(x = Birthdate, y = czypercDO, color = Dialect)) +
  geom_point() +
  geom_text(aes(label = rn)) +
  stat_smooth(method = "lm", se = FALSE) +  # linear trend per dialect
  ylim(0, 0.25) +
  # overall linear trend across all writers, drawn in black
  stat_smooth(aes(x = fig3_data[, Birthdate], y = fig3_data[, czypercDO], color = "general"),
              method = "lm", se = FALSE, size = 1, fill = "black", colour = "black")
ggplotly(figure3)


Figure 4

Diachronic development of different tsi uses. Blue line: czypercDO for a given birthdate; red line: czyperc for a given birthdate.

fig4_data <- writerstable_ypoe[POETRY != "YPOETRY" & Birthdate != "NA" & rn != "Borokhov Ber"]
figure4 <- ggplot(fig4_data, aes(x = Birthdate, y = czypercDO)) +
  geom_point() +
  geom_text(aes(label = rn)) +
  stat_smooth(method = "loess", se = FALSE) +  # czypercDO curve
  ylim(0, 0.25) +
  # czyperc curve; note that color = "red" inside aes() maps a constant label,
  # so the curve is drawn in ggplot2's first default hue rather than a set colour
  stat_smooth(aes(x = fig4_data[, Birthdate], y = fig4_data[, czyperc], color = "red"),
              method = "loess", se = FALSE) +
  scale_shape_discrete(name = "czypercDO vs CZYPERC", breaks = c("colour"))
ggplotly(figure4)


Figure 5

Figure 5: Tsi as a Yes/No question particle (czyperc) vs. tsi in other functions (czypercDO).

# czyperc against czypercDO per writer, with a linear fit and its confidence band
figure5 <- ggplot(writerstable_ypoe[POETRY != "YPOETRY" & Birthdate != "NA" & rn != "Borokhov Ber"],
                  aes(x = czypercDO, y = czyperc)) +
  geom_point() +
  geom_text(aes(label = paste(rn, Birthdate, sep = "-"))) +
  stat_smooth(method = "lm") +
  ylim(0, 0.25)

ggplotly(figure5)


Figure 6

Figure 6: Tsi as Yes/No question particle in poetry (YPOETRY) and fiction (NA).

# Authors represented by both a fiction entry and a separate poetry entry ("... POE")
fig6_writers <- c("Markish", "Markish POE", "Sutskever", "Sutskever POE",
                  "Eliezer Steinbarg", "Eliezer Steinbarg POE", "Fininberg", "Fininberg POE",
                  "J. Glatstein", "J. Glatstein POE", "Vergelis", "Vergelis POE",
                  "Perets", "Perets POE", "Kulbak", "Kulbak POE")
figure6 <- ggplot(writerstable_ypoe[rn %in% fig6_writers], aes(x = POETRY, y = czyperc)) +
  geom_boxplot() +
  geom_text(aes(label = paste(rn, Birthreg, Birthdate, paste("ALLSUM:", ALLSUM, sep = ""), sep = "-")))
ggplotly(figure6)


Figure 7

Figure 7: Diachronic development of Yes/No question tsi with rhetorical semantics. Blue curve: percentage of rhetorical questions with tsi (rhetperc); red curve: percentage of rhetorical questions with tsi in other functions or without a particle (rhetpercNOQWORD); black curve: general czyperc rate.

# Only writers with more than 8 rhetorical questions (rhetsum > 8)
fig7_data <- writerstable_with_rhet[rhetsum > 8]
figure7 <- ggplot(fig7_data, aes(x = Birthdate, y = rhetperc, colour = Dialect, group = 1)) +
  geom_point() +
  geom_text(aes(label = rn)) +
  stat_smooth(method = "loess", se = FALSE) +  # rhetperc curve
  # rhetpercNOQWORD curve, drawn in red
  stat_smooth(aes(x = fig7_data[, Birthdate], y = fig7_data[, rhetpercNOQWORD]),
              method = "loess", se = FALSE, size = 1, fill = "black", colour = "red") +
  # general czyperc curve, drawn in black
  stat_smooth(aes(x = fig7_data[, Birthdate], y = fig7_data[, czyperc]),
              method = "loess", se = FALSE, size = 2, colour = "black")

ggplotly(figure7)


Figure 8

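“tsi”-kind particles in all functions in selected languages.

# Boxplot of czyperc per language for non-poetry writers born after 1800.
# label is not a geom_jitter aesthetic, so ggplot2 warns about it; it is
# presumably kept so that the writer's name can appear in the interactive plot.
figure8 <- ggplot(writerstable_belyidpolukr[Birthdate > 1800 & POETRY == "NA"],
                  aes(x = LANG, y = czyperc, color = LANG)) +
  geom_boxplot() +
  geom_jitter(aes(label = rn), alpha = 0.8)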
ggplotly(figure8)


Figure 9

Diachronic development of “tsi”-kind particles in all functions in selected languages.

# Loess trend of czyperc per language; writer labels are shown for YID, PL and BEL only
figure9 <- ggplot(writerstable_belyidpolukr[Birthdate != "NA" & POETRY == "NA" &
                                            rn != "Bruno Schulz" & rn != "Borokhov Ber" &
                                            LANG %in% c("YID", "BEL", "PL", "UKR")],
                  aes(x = as.numeric(as.character(Birthdate)),
                      y = as.numeric(as.character(czyperc)), color = LANG)) +
  geom_text(data = writerstable_belyidpolukr[Birthdate != "NA" & POETRY == "NA" &
                                             rn != "Bruno Schulz" & rn != "Borokhov Ber" &
                                             LANG %in% c("YID", "PL", "BEL")],
            aes(label = rn), check_overlap = TRUE) +
  stat_smooth(method = "loess", se = FALSE) +
  ylim(0, 0.5) +
  labs(x = "Birthdate", y = "czyperc")

ggplotly(figure9)